[#38390] Bug fix for Datetime64Formatter with values of ndim > 1 #38391

BryanCutler · 2020-12-09T19:28:54Z

- closes #38390 (BUG: ExtensionArray with 2D datetime64 values errors on display formatting)
- tests added / passed
- passes black pandas
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

This change fixes a bug in Datetime64Formatter, which raises an error for values with ndim > 1. The bug was discovered when formatting/displaying an ExtensionArray backed by 2D datetime64 values. The formatter currently accepts 2D values, but ends up returning nested strings instead of a flat list of strings. The same error occurs when using Datetime64Formatter directly on 2D values.

The fix flattens the values and formats them into a flat list of datetime64 strings, reshapes that list back to the original shape, and then formats a second time over the outer dimension. The second pass produces the properly nested list representation for each outer element and the final flat list of strings.
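For illustration only, here is a minimal sketch of that flatten / format / reshape / format-again idea in plain Python. It is not the patch itself: nested lists stand in for the ndarray, `strftime` stands in for the datetime64 formatting, and `shape_of`, `flatten`, `reshape`, `render` and `format_rows` are hypothetical helper names.

```python
from datetime import datetime


def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


def flatten(nested):
    """Yield the leaves in row-major order (the role ravel() plays)."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def reshape(flat, shape):
    """Regroup a flat list into nested lists of the given shape."""
    if len(shape) == 1:
        return list(flat)
    if shape[0] == 0:
        return []
    step = len(flat) // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def render(item):
    """Render a (possibly nested) list of strings as one bracketed string."""
    if isinstance(item, list):
        return "[" + ", ".join(render(x) for x in item) + "]"
    return item


def format_rows(values, fmt="%Y-%m-%d"):
    """Return one display string per element of the outer dimension."""
    shape = shape_of(values)
    # Pass 1: format every element of the flattened input.
    flat_strs = [v.strftime(fmt) for v in flatten(values)]
    if len(shape) <= 1:
        return flat_strs
    # Pass 2: restore the shape, then render each outer row as one string.
    return [render(row) for row in reshape(flat_strs, shape)]


values_2d = [
    [datetime(2020, 1, 1), datetime(2020, 1, 2)],
    [datetime(2020, 1, 3), datetime(2020, 1, 4)],
]
print(format_rows(values_2d))
```

The result always has one string per outer element, so the caller gets a flat list regardless of how many inner dimensions the input has.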

jbrockmendel · 2020-12-09T20:13:44Z

pandas/tests/io/formats/test_format.py

        result = formatter.get_result()
        assert result == ["10:10", "12:12"]

+    def test_datetime64formatter_2d_array(self):


do we need any tz-aware cases?

Yes, good idea. It's likely to fail the same way so I'll add that too.

BryanCutler · 2020-12-09T22:47:56Z

pandas/io/formats/format.py

-        if not isinstance(values, DatetimeIndex):
-            values = DatetimeIndex(values)
+        if not isinstance(flat_values, DatetimeIndex):
+            flat_values = DatetimeIndex(flat_values)


DatetimeIndex appears to only support 1-d values. It will actually construct with dim > 1 but can error at some places, so pass flattened values just to be safe.

could use DatetimeArray?

Yeah, that is a little cleaner since DatetimeArray._format_native_types is the only thing being used here. It still might be good to call values.ravel() here since it's needed if self.formatter is set, although not really necessary otherwise.

BryanCutler · 2020-12-09T22:49:21Z

pandas/io/formats/format.py

+            flat_str_values = np.array([self.formatter(x) for x in flat_values])
+            fmt_values = flat_str_values.reshape(values.shape)
+        else:
+            fmt_values = flat_values._data._format_native_types(


This will actually call ravel()/reshape() itself, but flattened values are needed above so easier to use that here as well.

BryanCutler · 2020-12-15T08:22:38Z

Thanks for the review, @jbrockmendel, and sorry for the delay. I updated with a fix and tests for Datetime64TZFormatter as well, since the same bug happens there.

BryanCutler · 2020-12-15T22:01:51Z

The remaining test failures look unrelated; they appear to be caused by a network timeout.

pandas/io/formats/format.py

doc/source/whatsnew/v1.2.0.rst

jbrockmendel · 2020-12-19T21:25:04Z

pandas/io/formats/format.py

-            values = DatetimeIndex(values)
+        values = np.asarray(self.values)
+        flat_values = values.ravel() if values.ndim > 1 else values
+        flat_values = DatetimeArray(flat_values)


DatetimeArray can handle 2D values, so some of the ravel/ndim checks might not be needed

Yes, I noticed it's able to handle 2D values, but the extension array that I was fixing this for can have higher dimensions, so the ndim check above would still be needed, and so would flattening when self.formatter is set. It seems cleaner to flatten at the beginning and work with flattened values. And I think calling ravel() a second time would be a no-op?

OK no problem

jreback · 2020-12-22T14:37:25Z

doc/source/whatsnew/v1.3.0.rst

 - Bug in :func:`read_csv` not accepting ``usecols`` with different length than ``names`` for ``engine="python"`` (:issue:`16469`)
 - Bug in :func:`read_csv` raising ``TypeError`` when ``names`` and ``parse_dates`` is specified for ``engine="c"`` (:issue:`33699`)
 - Allow custom error values for parse_dates argument of :func:`read_sql`, :func:`read_sql_query` and :func:`read_sql_table` (:issue:`35185`)
+- Bug in :class:`Datetime64Formatter` that caused error on string representation with extension types of datetime64 values and ndim > 1 (:issue:`38390`)


this is not user facing (as it's internal), is there anything that is?

The bug was found when calling repr() on an ExtensionType, so I will put this in terms of that instead.

jreback · 2020-12-22T14:37:51Z

pandas/io/formats/format.py

-        if not isinstance(values, DatetimeIndex):
-            values = DatetimeIndex(values)
+        values = np.asarray(self.values)
+        flat_values = values.ravel() if values.ndim > 1 else values


can we use the @ravel_compat here?

Yeah, I think that would work. The only thing is that if self.formatter is set, it would add an unnecessary conversion to a numpy.ndarray and then back again to a list. I'm not sure how crucial that case is; would you like me to go ahead and change it anyway?
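To make the trade-off concrete, here is a rough sketch of what a ravel-then-reshape decorator does in general. This is not the pandas implementation of ravel_compat: `Grid` is a hypothetical stand-in for an N-D array, and `ravel_compat_sketch` and `to_strings` are names invented for this example.

```python
from functools import wraps


class Grid:
    """Hypothetical stand-in for an N-D array: flat data plus a shape."""

    def __init__(self, data, shape):
        self.data = list(data)
        self.shape = tuple(shape)

    @property
    def ndim(self):
        return len(self.shape)

    def ravel(self):
        # Same elements, viewed as one dimension.
        return Grid(self.data, (len(self.data),))

    def reshape(self, shape):
        return Grid(self.data, shape)


def ravel_compat_sketch(meth):
    """Let a function written for 1-D input accept N-D input."""

    @wraps(meth)
    def wrapper(arr, *args, **kwargs):
        if arr.ndim == 1:
            return meth(arr, *args, **kwargs)
        # Flatten, run the 1-D implementation, then restore the shape.
        result = meth(arr.ravel(), *args, **kwargs)
        return result.reshape(arr.shape)

    return wrapper


@ravel_compat_sketch
def to_strings(arr):
    # Written as if the input were always 1-D.
    return Grid([str(x) for x in arr.data], arr.shape)


out = to_strings(Grid([1, 2, 3, 4], (2, 2)))
print(out.shape, out.data)
```

Note that the wrapped function has to return an array-like object so the decorator can reshape it. That is the extra list-to-array-and-back conversion mentioned above for the self.formatter case.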

jreback · 2020-12-22T14:38:54Z

pandas/io/formats/format.py

        if self.formatter is not None and callable(self.formatter):
-            return [self.formatter(x) for x in values]
+            fmt_values = [self.formatter(x) for x in flat_values]
+        else:


why are we not using just the _format_native_types here? this makes this way more complicated.

Sorry, which line are you referring to? Did you mean instead of calling self.formatter?

I think the issue is that self.formatter may be user-supplied at some point in the process?
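A small sketch of why both branches exist, assuming the formatter can be any callable handed in by the caller. `format_flat` is a hypothetical name, and `isoformat` is only a stand-in for the default `_format_native_types` path.

```python
from datetime import datetime


def format_flat(values, formatter=None):
    """Format a flat sequence of datetimes into a list of strings."""
    if formatter is not None and callable(formatter):
        # Caller-supplied formatter: must be applied element by element,
        # so the input has to be flat at this point.
        return [formatter(v) for v in values]
    # Default path: stand-in for the vectorized native formatting.
    return [v.isoformat(sep=" ") for v in values]


stamps = [datetime(2020, 1, 1, 10, 10), datetime(2020, 1, 1, 12, 12)]
print(format_flat(stamps, formatter=lambda x: x.strftime("%H:%M")))
print(format_flat(stamps))
```

Because the custom formatter is arbitrary, it cannot simply be replaced by the native formatting call; only the default branch can.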

jreback · 2020-12-22T14:40:05Z

more importantly, shouldn't this actually be handled on the EA itself?

BryanCutler · 2020-12-28T19:38:41Z

Thanks for reviewing, @jreback. I just had a couple of questions.

> more importantly, shouldn't this actually be handled on the EA itself?

This would be preferred, but I could not find a way to do this. When calling repr() on a Series wrapping my ExtensionArray, the formatting goes like this:

1. Series sees the dtype is an ExtensionType and ends up in ExtensionArrayFormatter._format_strings()
2. ExtensionArrayFormatter._format_strings() converts the array to a numpy.ndarray
3. Datetime64Formatter formats the ndarray into a list of strings and returns

Is there a way for the ExtensionArray to tie into this somehow?

github-actions · 2021-01-28T00:17:52Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

BryanCutler · 2021-01-28T18:54:51Z

> This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

I am still planning on working on this. I noticed there is a similar problem with extension arrays with floating point values of ndim > 1, so I will work on a fix that will hopefully handle both of these cases soon.

github-actions · 2021-03-07T00:14:28Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

BryanCutler · 2021-03-09T07:25:14Z

I haven't had the time to get back to this, so I will close it for now and reopen when I can update it.

BryanCutler mentioned this pull request Dec 9, 2020

2+ dimensional tensors of timestamps crash pd.Series.repr() CODAIT/text-extensions-for-pandas#151

Closed

jbrockmendel reviewed Dec 9, 2020

BryanCutler commented Dec 9, 2020

BryanCutler added 10 commits December 15, 2020 11:16

Add tests for Datetime64 formatter

bb83864

Fix Datetime64Formatter to handle values with ndim > 1

460f33c

Fix formatting

39171e9

Add tests for Datetime64TZFormatter

3ba0925

Corrected tests

aac5f1b

Add tests and fix for Datetime64TZFormatter

14c64cd

Use DatetimeArray instead of DatetimeIndex

12dd233

Avoid calling ravel on Index

ec0aa15

Conditionally call ravel() if ndim > 1

6da6c68

Added whatsnew entry

da0f037

BryanCutler force-pushed the bug-ext-array-2d-datetime64-38390 branch from 75c381c to da0f037 Compare December 15, 2020 19:17

jbrockmendel reviewed Dec 17, 2020

jbrockmendel reviewed Dec 17, 2020

Changed to use values.ndim > 1

4140453

jbrockmendel reviewed Dec 19, 2020

jbrockmendel reviewed Dec 19, 2020

Move whatsnew entry to 1.3.0

1563c4d

jreback requested changes Dec 22, 2020

jreback added the ExtensionArray and Output-Formatting labels Dec 22, 2020

github-actions bot added the Stale label Jan 28, 2021

BryanCutler mentioned this pull request Jan 28, 2021

Fixes for use with Pandas 1.2.1 CODAIT/text-extensions-for-pandas#171

Merged

jorisvandenbossche removed the Stale label Feb 3, 2021

github-actions bot added the Stale label Mar 7, 2021

BryanCutler closed this Mar 9, 2021
